Here is an example of possible answers to the practical on fitting the deterministic SEITL model to the Tristan da Cunha outbreak.

Each section below corresponds to a section of the practical, so you can look at our example for one section and then go back to the practical to answer the following sections.

Although our example refers to the SEITL model, the same commands work for the SEIT4L model (e.g. data(SEIT4L_deter) instead of data(SEITL_deter)).

Setting the MCMC

# the fitmodel
data(SEITL_deter)

# wrapper for posterior
my_posteriorTdC <- function(theta) {
    
    my_fitmodel <- SEITL_deter
    my_init.state <- c(S = 279, E = 0, I = 2, T = 3, L = 0, Inc = 0)
    # note that the SEIT4L model has 4 state variables for the T compartment:
    # my_init.state <- c(S = 279, E = 0, I = 2, T1 = 3, T2 = 0, T3 = 0,
    #     T4 = 0, L = 0, Inc = 0)
    
    return(logPosterior(fitmodel = my_fitmodel, theta = theta, init.state = my_init.state, 
        data = FluTdC1971, margLogLike = dTrajObs, log = TRUE))
    
}

# theta to initialise the MCMC
init.theta <- c(R0 = 2, D_lat = 2, D_inf = 2, alpha = 0.8, D_imm = 16, rho = 0.85)

# diagonal elements of the covariance matrix for the Gaussian proposal
proposal.sd <- c(R0 = 1, D_lat = 0.5, D_inf = 0.5, alpha = 0.1, D_imm = 2, rho = 0.1)

# lower and upper limits of each parameter
lower <- c(R0 = 0, D_lat = 0, D_inf = 0, alpha = 0, D_imm = 0, rho = 0)
upper <- c(R0 = Inf, D_lat = Inf, D_inf = Inf, alpha = 1, D_imm = Inf, rho = 1)

# number of iterations for the MCMC
n.iterations <- 5000

# additional parameters for the adaptive MCMC, see ?mcmcMH for more details
adapt.size.start <- 100
adapt.size.cooling <- 0.999
adapt.shape.start <- 200
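
With these settings, the call to mcmcMH in the practical looks roughly as follows (a sketch; `my_mcmc.TdC` is just a name we chose, and ?mcmcMH documents the full list of arguments):

```r
my_mcmc.TdC <- mcmcMH(target = my_posteriorTdC,
    init.theta = init.theta,
    proposal.sd = proposal.sd,
    limits = list(lower = lower, upper = upper),
    n.iterations = n.iterations,
    adapt.size.start = adapt.size.start,
    adapt.size.cooling = adapt.size.cooling,
    adapt.shape.start = adapt.shape.start)
```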

You can now go back to the practical and try to run MCMC with those settings.

Run MCMC

If you didn't manage to run the MCMC, or if it took too long to obtain a few thousand iterations, you can load our short run as follows:

data(mcmc_TdC_deter_shortRun)
# this should load 2 objects in your environment: mcmc_SEITL and
# mcmc_SEIT4L. Each one is a list of 3 elements returned by mcmcMH
names(mcmc_SEITL)
## [1] "trace"            "acceptance.rate"  "covmat.empirical"
# the trace contains 9 variables for 5000 iterations
dim(mcmc_SEITL$trace)
## [1] 5001    9
# let's have a look at it
head(mcmc_SEITL$trace)
##         R0    D_lat     D_inf     alpha    D_imm       rho log.prior
## 1 2.000000 2.000000 2.0000000 0.8000000 16.00000 0.8500000 -12.81448
## 2 2.000000 2.000000 2.0000000 0.8000000 16.00000 0.8500000 -12.81448
## 3 2.078431 2.479177 1.3236209 0.7551034 15.35135 0.9016641 -12.81448
## 4 2.078431 2.479177 1.3236209 0.7551034 15.35135 0.9016641 -12.81448
## 5 2.820058 2.261659 1.5136383 0.6891596 16.45201 0.7988818 -12.81448
## 6 3.846204 2.449146 0.3711098 0.6083465 15.82348 0.6543921 -12.81448
##   log.likelihood log.posterior
## 1      -445.7795     -458.5939
## 2      -445.7795     -458.5939
## 3      -418.1840     -430.9984
## 4      -418.1840     -430.9984
## 5      -391.4595     -404.2740
## 6      -283.3309     -296.1454

You can now go back to the practical and analyse this trace.

Short run analysis

Here is an example of analysis for our preliminary run:

# convert to a mcmc object for coda
trace <- mcmc(mcmc_SEITL$trace)

# compute the acceptance rate
1 - rejectionRate(trace)
##             R0          D_lat          D_inf          alpha          D_imm 
##         0.1704         0.1704         0.1704         0.1704         0.1704 
##            rho      log.prior log.likelihood  log.posterior 
##         0.1704         0.0000         0.1704         0.1704
# between 0.1 and 0.6: looks good!

# let's have a look at the traces
xyplot(x = trace)

Although the chain was started at an init.theta with a low posterior density, it quickly finds the region of parameter space with high posterior density. Note also the constant trace of the log-prior, since we have assumed a uniform prior.

Overall, it looks like the chain reached its target distribution after 1000 steps.

# Let's find a suitable burn-in:
plotESSBurn(trace)

As anticipated from the trace, burning the first 1000 iterations maximizes the effective sample size (ESS).

# Let's create a new trace without the burn-in
trace.burn <- burnAndThin(trace, burn = 1000)
xyplot(x = trace.burn)

# Let's check the ESS
effectiveSize(trace.burn)
##             R0          D_lat          D_inf          alpha          D_imm 
##      172.59406       68.90601      177.05769      122.62280      101.66500 
##            rho      log.prior log.likelihood  log.posterior 
##      121.77874        0.00000      139.10356      139.10356

Although 4000 samples remain after the burn-in, the ESS is much smaller because of the autocorrelation of the chain.
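
As a reminder, the ESS discounts the nominal sample size \(N\) by the autocorrelation \(\rho_k\) of the chain at lag \(k\):

\[\mathrm{ESS} = \frac{N}{1 + 2\sum_{k=1}^{\infty}\rho_k}\]

so a strongly autocorrelated chain can have an ESS far below \(N\).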

# autocorrelation plot
acfplot(x = trace.burn, lag.max = 60)

The autocorrelation between samples drops substantially for a lag of 20 iterations. We can thin the trace to reduce the autocorrelation.

# Let's create a thinned trace
trace.burn.thin <- burnAndThin(trace.burn, thin = 20)
xyplot(x = trace.burn.thin)

# Let's check the ESS
effectiveSize(trace.burn.thin)
##             R0          D_lat          D_inf          alpha          D_imm 
##      131.26654       66.87189      191.00000       86.83456      112.46448 
##            rho      log.prior log.likelihood  log.posterior 
##       96.40768        0.00000      137.34214      137.34214

Although the thinned trace has 20 times fewer samples than the unthinned trace, it has a similar ESS. This is because the autocorrelation has been reduced.

# new autocorrelation plot
acfplot(x = trace.burn.thin, lag.max = 60)

Let's compare the posterior estimates of the thinned and unthinned traces.

# The unthinned trace
summary(trace.burn)
## 
## Iterations = 1:4001
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 4001 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                     Mean      SD  Naive SE Time-series SE
## R0               17.2400 3.86076 0.0610363       0.293873
## D_lat             1.2168 0.33682 0.0053249       0.040576
## D_inf            10.5268 2.27555 0.0359751       0.171013
## alpha             0.5394 0.03628 0.0005736       0.003276
## D_imm             8.0693 1.97092 0.0311591       0.195471
## rho               0.6998 0.04898 0.0007744       0.004439
## log.prior       -12.8145 0.00000 0.0000000       0.000000
## log.likelihood -141.2841 1.68954 0.0267106       0.143252
## log.posterior  -154.0985 1.68954 0.0267106       0.143252
## 
## 2. Quantiles for each variable:
## 
##                     2.5%       25%       50%      75%     97.5%
## R0                9.8968   14.4880   16.8702   19.586   24.8133
## D_lat             0.5709    0.9951    1.1892    1.423    1.9408
## D_inf             5.9743    8.8864   10.6053   12.226   14.4630
## alpha             0.4677    0.5150    0.5406    0.565    0.6044
## D_imm             4.4876    6.5216    7.8916    9.520   12.0038
## rho               0.6082    0.6674    0.6995    0.735    0.7919
## log.prior       -12.8145  -12.8145  -12.8145  -12.814  -12.8145
## log.likelihood -145.3254 -142.3281 -141.0061 -139.928 -138.9663
## log.posterior  -158.1398 -155.1425 -153.8206 -152.743 -151.7808

# The thinned trace
summary(trace.burn.thin)
## 
## Iterations = 1:191
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 191 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                     Mean      SD Naive SE Time-series SE
## R0               17.3988 3.85418 0.278878       0.336399
## D_lat             1.2195 0.33718 0.024398       0.041233
## D_inf            10.6031 2.24883 0.162720       0.162720
## alpha             0.5389 0.03511 0.002541       0.003768
## D_imm             8.0310 1.98373 0.143538       0.187058
## rho               0.6987 0.04730 0.003423       0.004818
## log.prior       -12.8145 0.00000 0.000000       0.000000
## log.likelihood -141.2704 1.69755 0.122830       0.144851
## log.posterior  -154.0848 1.69755 0.122830       0.144851
## 
## 2. Quantiles for each variable:
## 
##                     2.5%       25%       50%       75%     97.5%
## R0               10.0298   14.6419   17.1295   19.6005   24.7571
## D_lat             0.5684    1.0116    1.1961    1.4486    1.9094
## D_inf             6.3118    9.1569   10.5184   12.4720   14.2501
## alpha             0.4685    0.5176    0.5402    0.5620    0.5992
## D_imm             4.4508    6.5154    7.8623    9.5462   11.7747
## rho               0.6094    0.6675    0.7011    0.7334    0.7807
## log.prior       -12.8145  -12.8145  -12.8145  -12.8145  -12.8145
## log.likelihood -145.3972 -142.1858 -140.9833 -139.8503 -138.9644
## log.posterior  -158.2117 -155.0003 -153.7978 -152.6648 -151.7789

They are very similar. So why thin? Because autocorrelation produces clumpy samples that are unrepresentative, in the short run, of the true underlying posterior distribution. We can check this by comparing the thinned and unthinned distributions using the function plotPosteriorDensity of the fitR package:

plotPosteriorDensity(list(unthinned = trace.burn, thinned = trace.burn.thin))

The thinned trace shows a smoother distribution despite having fewer samples than the unthinned one. This is because the local "bumps" of the unthinned distribution are caused by autocorrelated samples.

You can now go back to the practical and perform a similar analysis for a long-run MCMC.

Long run analysis

Here is an example of an analysis of our long run (\(10^5\) iterations).

# load mcmc output
data(mcmc_TdC_deter_longRun)

# create mcmc objects for both traces
trace1 <- mcmc(mcmc_SEITL_theta1$trace)
trace2 <- mcmc(mcmc_SEITL_theta2$trace)

# combine traces as mcmc.list object
trace <- mcmc.list(list(trace1, trace2))

# let's have a look
head(trace, 3)
## [[1]]
## Markov Chain Monte Carlo (MCMC) output:
## Start = 1 
## End = 4 
## Thinning interval = 1 
##         R0    D_lat    D_inf    alpha    D_imm       rho log.prior
## 1 2.000000 2.000000 2.000000 0.800000 16.00000 0.8500000 -12.81448
## 2 2.000000 2.000000 2.000000 0.800000 16.00000 0.8500000 -12.81448
## 3 3.291068 1.927748 2.139757 0.824364 19.14906 0.8684192 -12.81448
## 4 3.644072 1.868155 1.899170 0.809881 18.47234 0.8769007 -12.81448
##   log.likelihood log.posterior
## 1      -445.7795     -458.5939
## 2      -445.7795     -458.5939
## 3      -442.6925     -455.5070
## 4      -438.2018     -451.0162
## 
## [[2]]
## Markov Chain Monte Carlo (MCMC) output:
## Start = 1 
## End = 4 
## Thinning interval = 1 
##         R0   D_lat    D_inf      alpha    D_imm       rho log.prior
## 1 20.00000 2.00000 2.000000 0.10000000 8.000000 0.3000000 -12.81448
## 2 20.00000 2.00000 2.000000 0.10000000 8.000000 0.3000000 -12.81448
## 3 18.71727 2.70723 1.014131 0.07367196 9.405255 0.2468698 -12.81448
## 4 18.71727 2.70723 1.014131 0.07367196 9.405255 0.2468698 -12.81448
##   log.likelihood log.posterior
## 1      -321.5801     -334.3946
## 2      -321.5801     -334.3946
## 3      -320.1123     -332.9268
## 4      -320.1123     -332.9268
## 
## attr(,"class")
## [1] "mcmc.list"

# acceptance rate
1 - rejectionRate(trace)
##             R0          D_lat          D_inf          alpha          D_imm 
##        0.20254        0.20254        0.20254        0.20254        0.20254 
##            rho      log.prior log.likelihood  log.posterior 
##        0.20254        0.00000        0.20254        0.20254
# close to the optimal value of 0.234

# ESS
effectiveSize(trace)
##             R0          D_lat          D_inf          alpha          D_imm 
##       5437.065       6163.158       6093.018       7106.347       6806.269 
##            rho      log.prior log.likelihood  log.posterior 
##       7453.447          0.000       5458.377       5458.377

# plot the traces
xyplot(trace)

Note that the acceptance rate and the ESS are computed for the combined chain whereas the traces are plotted for each chain. Also, given the very high ESS we can reasonably choose a burn-in visually, say 5000 iterations.

trace.burn <- burnAndThin(trace, burn = 5000)

# removing the burn-in increases the ESS
effectiveSize(trace.burn)
##             R0          D_lat          D_inf          alpha          D_imm 
##       5858.275       6107.597       6535.558       6887.668       6543.338 
##            rho      log.prior log.likelihood  log.posterior 
##       7163.971          0.000       5612.370       5612.370

# autocorrelation
acfplot(trace.burn, lag.max = 60)

Again, given the very high ESS, we can be quite generous in our choice of thinning.

# Thinning: let's keep 1 iteration every 40
trace.burn.thin <- burnAndThin(trace.burn, thin = 40)
xyplot(trace.burn.thin)

However, let's compare the thinned and unthinned distributions.

# Note that plotPosteriorDensity can take a list of mcmc.list objects; it
# will combine the elements of each mcmc.list. Let's plot the combined
# unthinned trace vs the combined thinned trace.
plotPosteriorDensity(list(unthinned = trace.burn, thinned = trace.burn.thin))

In contrast to the previous short run, there is almost no difference between the thinned and unthinned chains. Indeed, with such a long chain, the clumpy autocorrelation has been averaged out!

In fact, several references show that a longer (unthinned) chain usually yields better estimates of the true posterior than a shorter thinned chain, even for percentiles in the tail of the distribution. That said, thinning can be useful for other reasons, such as memory or time constraints in post-chain processing.

Now, we can check whether the two independent chains, started at theta1 and theta2, have converged to the same posterior distribution.

densityplot(trace.burn.thin)

Since the chains have converged to the same posterior, we can use the combined estimates.

# the function summary combines the chains of a mcmc.list
summary(trace.burn.thin)
## 
## Iterations = 1:2318
## Thinning interval = 1 
## Number of chains = 2 
## Sample size per chain = 2318 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                     Mean      SD  Naive SE Time-series SE
## R0               17.0909 4.30760 0.0632650      0.0721890
## D_lat             1.1458 0.35552 0.0052214      0.0057899
## D_inf            10.7516 2.23701 0.0328546      0.0349587
## alpha             0.5373 0.03823 0.0005615      0.0006162
## D_imm             7.7800 2.11592 0.0310762      0.0315506
## rho               0.6957 0.05189 0.0007621      0.0007799
## log.prior       -12.8145 0.00000 0.0000000      0.0000000
## log.likelihood -141.2702 1.71514 0.0251900      0.0282695
## log.posterior  -154.0846 1.71514 0.0251900      0.0282695
## 
## 2. Quantiles for each variable:
## 
##                     2.5%       25%       50%       75%     97.5%
## R0                9.9720   13.9366   16.6412   19.9047   26.4282
## D_lat             0.4806    0.8984    1.1447    1.3817    1.8422
## D_inf             6.4954    9.1177   10.7736   12.4811   14.6550
## alpha             0.4588    0.5122    0.5386    0.5633    0.6098
## D_imm             4.1559    6.3534    7.6108    9.0465   12.5091
## rho               0.5987    0.6601    0.6942    0.7295    0.8023
## log.prior       -12.8145  -12.8145  -12.8145  -12.8145  -12.8145
## log.likelihood -145.5656 -142.1288 -140.9693 -140.0230 -138.8926
## log.posterior  -158.3801 -154.9433 -153.7838 -152.8375 -151.7071

Running several independent chains starting from different parts of the parameter space allows us to check whether the posterior distribution is multi-modal. If so, then we must be careful when combining the chains. For instance, an estimate of the mean computed with summary won't be meaningful for a parameter with a multi-modal posterior.

By contrast, for unimodal posteriors, combining chains is an efficient way to increase the ESS and the precision of the posterior estimates. Furthermore, running several "shorter" chains in parallel is faster than running one "long" chain.
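
For instance, the two chains of this section could be run in parallel with the parallel package (a sketch, assuming my_posteriorTdC and the MCMC settings defined in the first section; note that mclapply relies on forking and so does not run in parallel on Windows):

```r
library(parallel)

# the two starting points used above
init.thetas <- list(
    theta1 = c(R0 = 2, D_lat = 2, D_inf = 2, alpha = 0.8, D_imm = 16, rho = 0.85),
    theta2 = c(R0 = 20, D_lat = 2, D_inf = 2, alpha = 0.1, D_imm = 8, rho = 0.3)
)

# run one chain per core
mcmc.runs <- mclapply(init.thetas, function(theta) {
    mcmcMH(target = my_posteriorTdC, init.theta = theta,
        proposal.sd = proposal.sd, n.iterations = n.iterations)
}, mc.cores = 2)

# combine the traces as an mcmc.list, as before
trace <- mcmc.list(lapply(mcmc.runs, function(x) mcmc(x$trace)))
```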

Finally, let's assess the fit of the deterministic SEITL model.

# load data
data(FluTdC1971)

# the same init.state as for the fit
init.state <- c(S = 279, E = 0, I = 2, T = 3, L = 0, Inc = 0)

# by default plotPosteriorFit summarizes the fit of 100 thetas sampled
# from the posterior
plotPosteriorFit(trace = trace, fitmodel = SEITL_deter, init.state = init.state, 
    data = FluTdC1971)


# alternatively, one can plot the fit of the mean of the posterior (in this
# case the observation is replicated 100 times)
plotPosteriorFit(trace = trace, fitmodel = SEITL_deter, init.state = init.state, 
    data = FluTdC1971, posterior.summary = "mean")


# or using the maximum a posteriori (MAP) estimate
plotPosteriorFit(trace = trace, fitmodel = SEITL_deter, init.state = init.state, 
    data = FluTdC1971, posterior.summary = "max")

Note that the 95% credible interval (CI) of the posterior fit under the MAP captures the highest data point. By contrast, the fit of the second peak seems quite poor, even for the MAP.

You can now go back to the practical and look at the posterior correlations between the parameters.

Correlations

The correlations of the posterior distribution can be investigated using levelplot.

# levelplot doesn't accept `mcmc.list` objects, so we pass the first `mcmc` only.
levelplot(trace.burn.thin[[1]], col.regions = heat.colors(100))

Note the strong positive correlations (~0.8) between \(R_0\) and \(D_{lat}\) and between \(R_0\) and \(D_{inf}\). In order to explain the wide 95% CIs of \(R_0\) and \(D_{inf}\), let's have a look at the contact rate \(\beta = R_0/D_{inf}\).

with(as.data.frame(trace.burn.thin[[1]]), quantile(R0/D_inf, probs = c(0.025, 
    0.25, 0.5, 0.75, 0.975)))
##     2.5%      25%      50%      75%    97.5% 
## 1.091157 1.395928 1.573898 1.771753 2.203119
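
The identifiability of \(\beta\) makes sense given its role in the force of infection of the SEITL model, which drives the initial exponential growth of the outbreak:

\[\lambda = \beta \frac{I}{N} = \frac{R_0}{D_{inf}} \frac{I}{N}\]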

The posterior value of \(\beta\) varies somewhat less than the posterior values of \(R_0\) and \(D_{inf}\). Indeed, this parameter is constrained by the shape of the initial phase of the outbreak. Conversely, an infinite number of combinations of \(R_0\) and \(D_{inf}\) lead to the same \(\beta\), hence their wide 95% CIs.

A second effect that could explain the wide posterior distribution of \(R_0\) is the very high attack rate: once \(R_0 > 5\), it makes little difference whether \(R_0\) is equal to, say, 10 or 20.
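
This saturation can be seen from the classical final-size relation, which gives the attack rate \(z\) as a function of \(R_0\) in a fully susceptible population:

\[z = 1 - e^{-R_0 z}\]

For \(R_0 = 10\) this gives \(z \approx 0.99995\), while for \(R_0 = 20\) it gives \(z \approx 1 - 2\times 10^{-9}\): in a population of 284 individuals, these two attack rates are indistinguishable.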

We can also note that the posterior estimate of \(D_{inf} = 11\) days (95% CI: \([6-15]\)) is biologically unrealistic compared with previous empirical estimates. However, our approach did not include any prior information, as the default SEITL_deter fitmodel comes with uniform priors for all parameters.

In order to include previous empirical information on \(D_{lat}\) and \(D_{inf}\), let's modify the dprior function of SEITL_deter as follows:

SEITL_deter$dprior <- function(theta) {
    
    # package with truncated normal distribution
    library(truncnorm)
    
    log.prior.R0 <- dunif(theta[["R0"]], min = 1, max = 50, log = TRUE)
    # normal distribution with mean = 2 and sd = 1 and truncated at 0
    log.prior.latent.period <- log(dtruncnorm(theta[["D_lat"]], a = 0, b = Inf, 
        mean = 2, sd = 1))
    # normal distribution with mean = 2 and sd = 1 and truncated at 0
    log.prior.infectious.period <- log(dtruncnorm(theta[["D_inf"]], a = 0, b = Inf, 
        mean = 2, sd = 1))
    log.prior.temporary.immune.period <- dunif(theta[["D_imm"]], min = 0, max = 50, 
        log = TRUE)
    log.prior.probability.long.term.immunity <- dunif(theta[["alpha"]], min = 0, 
        max = 1, log = TRUE)
    log.prior.reporting.rate <- dunif(theta[["rho"]], min = 0, max = 1, log = TRUE)
    
    return(log.prior.R0 + log.prior.latent.period + log.prior.infectious.period + 
        log.prior.temporary.immune.period + log.prior.probability.long.term.immunity + 
        log.prior.reporting.rate)
    
}

Note the choice of a truncated normal distribution since \(D_{lat}\) and \(D_{inf}\) must be positive.
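
For reference, the density of a normal distribution with mean \(\mu\) and standard deviation \(\sigma\), truncated to positive values, is

\[f(x) = \frac{\phi\left((x-\mu)/\sigma\right)}{\sigma\left(1 - \Phi(-\mu/\sigma)\right)}, \quad x > 0,\]

where \(\phi\) and \(\Phi\) are the density and cumulative distribution function of the standard normal distribution. With \(\mu = 2\) and \(\sigma = 1\), the normalising constant is \(1 - \Phi(-2) \approx 0.977\), so the truncation only mildly reshapes the prior.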

You can now go back to the practical and run a MCMC with this informative prior.

Informative priors

Here we combine both chains with informative priors and compare the posterior distribution with the one above.

# create mcmc objects for both chains
trace.info1 <- mcmc(mcmc_SEITL_infoPrior_theta1$trace)
trace.info2 <- mcmc(mcmc_SEITL_infoPrior_theta2$trace)

# combine in a mcmc.list
trace.info <- mcmc.list(trace.info1, trace.info2)

# burn and thin as for the chain with uniform priors (see above sections)
trace.info.burn.thin <- burnAndThin(trace.info, burn = 5000, thin = 40)

# check that both chains converged to the same posterior
plotPosteriorDensity(trace.info.burn.thin)

# compare the effect of informative priors on the posterior distribution
plotPosteriorDensity(list(unif = trace.burn.thin, info = trace.info.burn.thin))

\(R_0\) and \(D_{inf}\) have very different posteriors. This is expected, since there is an informative prior on \(D_{inf}\) and \(R_0\) is strongly correlated with \(D_{inf}\). Note also that the modes of all other parameters have changed, though less than those of \(D_{inf}\) and \(R_0\). This illustrates the influence that a single prior can have on the full posterior distribution.

You can now go back to the practical.

Model selection
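
The code below computes the deviance information criterion (DIC), which is based on the deviance \(D(\theta) = -2\log L(\theta)\):

\[p_D = \frac{1}{2}\widehat{\operatorname{var}}\left(D(\theta)\right), \qquad \mathrm{DIC} = D(\bar{\theta}) + 2 p_D,\]

where \(\bar{\theta}\) is the posterior mean of the parameters and the variance is taken over the posterior samples.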

# combine the two chains in a data frame
trace.combined <- ldply(trace.info.burn.thin)

# take the mean of theta
theta.bar <- colMeans(trace.combined[SEITL_deter$theta.names])
print(theta.bar)
##        R0     D_lat     D_inf     alpha     D_imm       rho 
## 7.6250928 1.2904394 3.6664971 0.4751491 9.1354413 0.6468514

# compute its log-likelihood
init.state <- c(S = 279, E = 0, I = 2, T = 3, L = 0, Inc = 0)
log.like.theta.bar <- dTrajObs(SEITL_deter, theta.bar, init.state, data = FluTdC1971, 
    log = TRUE)
print(log.like.theta.bar)
## [1] -142.84

# and its deviance
D.theta.bar <- -2 * log.like.theta.bar
print(D.theta.bar)
## [1] 285.6799

# the effective number of parameters
p.D <- var(-2 * trace.combined$log.likelihood)/2
print(p.D)
## [1] 8.116991

# and finally the DIC
DIC <- D.theta.bar + 2 * p.D
print(DIC)
## [1] 301.9139

You can now go back to the practical.